Shortest Unique Substring Query Revisited
Abstract
We revisit the problem of finding a shortest unique substring (SUS), proposed recently in [6]. We propose an optimal O(n) time and space algorithm that finds an SUS for every location of a string of size n. Our algorithm significantly improves on the O(n^2) time complexity needed by [6]. We also support finding all the SUSes covering every location, whereas the solution in [6] can find only one SUS for every location. Further, our solution is simpler and easier to implement, and can also be more space efficient in practice, since we use only the inverse suffix array and the longest common prefix array of the string, while the algorithm in [6] uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study showing that our algorithm is much faster and uses less space than the one in [6].
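To make the role of the inverse suffix array and the longest common prefix (LCP) array concrete, here is a minimal Python sketch. It is not the O(n) algorithm of the paper: the suffix array is built naively and each query scans all start positions, so it only illustrates how the two arrays expose uniqueness. The function names (`suffix_array`, `lcp_array`, `shortest_unique_from`, `sus_for_position`) and the 0-based indexing are assumptions made for this illustration.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes; for illustration only,
    # not a linear-time method.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[r] = length of the longest common prefix of the suffixes
    # at ranks r-1 and r; lcp[0] is 0 by convention.
    n = len(s)
    lcp = [0] * n
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        k = 0
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        lcp[r] = k
    return lcp


def shortest_unique_from(s):
    # For each start i, the length of the shortest unique substring
    # that starts at i, or None if no unique substring starts there.
    n = len(s)
    sa = suffix_array(s)
    rank = [0] * n  # inverse suffix array: rank[i] = rank of suffix i
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = lcp_array(s, sa)
    out = []
    for i in range(n):
        r = rank[i]
        left = lcp[r]
        right = lcp[r + 1] if r + 1 < n else 0
        # One character more than the longest match with either
        # lexicographic neighbour makes the substring unique.
        length = max(left, right) + 1
        out.append(length if i + length <= n else None)
    return out


def sus_for_position(s, p):
    # One SUS covering position p, as an inclusive (start, end) pair.
    # Quadratic overall if called for every p; a sketch, not the
    # optimal method described in the abstract.
    lens = shortest_unique_from(s)
    best = None
    for i in range(p + 1):
        if lens[i] is None:
            continue
        end = max(i + lens[i] - 1, p)  # extend rightwards to cover p
        if best is None or end - i < best[1] - best[0]:
            best = (i, end)
    return best
```

For the string "abab", `sus_for_position("abab", 1)` returns `(1, 2)`, the substring "ba", which occurs once while both "a" and "b" occur twice. Ties are resolved in favour of the leftmost start, so this sketch reports one SUS per position rather than all of them.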
Similar resources
Shortest Unique Queries on Strings
Let D be a long input string of n characters (from an alphabet of size up to 2^w, where w is the number of bits in a machine word). Given a substring q of D, a shortest unique query returns a shortest unique substring of D that contains q. We present an optimal structure that consumes O(n) space, can be built in O(n) time, and answers a query in O(1) time. We also extend our techniques to solve s...
Shortest Unique Substring Queries on Run-Length Encoded Strings
We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i ≤ s ≤ t ≤ j) if for any i′ ≤ s ≤ t ≤ j′ with j − i > j′ − i′, S[i′..j′] occurs at least twice in S. Given a run-length encoding of size m of a string of len...
Shortest unique palindromic substring queries in optimal time
A palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s, t], and every palindromic substring of S which contains interval [s, t] and is shorter than P occurs at least twice in S. The SUPS...
Tight bound on the maximum number of shortest unique substrings
A substring Q of a string S is called a shortest unique substring (SUS) for position p in S, if Q occurs exactly once in S, this occurrence of Q contains position p, and every substring of S which contains position p and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query position p all the SUSs for position p can ...
Tight Bounds on the Maximum Number of Shortest Unique Substrings
A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs...